Bond, Francis, Timothy Baldwin, Richard Fothergill and Kiyotaka Uchimoto (2012) Japanese SemCor: A Sense-tagged Corpus of Japanese, In Proceedings of the 6th International Global Wordnet Conference (GWC 2012), Matsue, Japan

Authors

  • Francis Bond
  • Timothy Baldwin
  • Richard Fothergill
  • Kiyotaka Uchimoto
Abstract

In this paper we describe the creation of the Japanese SemCor (JSEMCOR) sense-tagged corpus of Japanese. The corpus is a translation of the English SEMCOR, with senses projected across from English. The final corpus consists of 14,169 sentences with 150,555 content words, of which 58,265 are sense tagged. The corpus is one of the corpora used to provide sense frequency data for the Japanese Wordnet.

Similar resources

Development of the Japanese WordNet

After a long history of compilation of our own lexical resources, EDR Japanese/English Electronic Dictionary, and discussions with major players on development of various WordNets, Japanese National Institute of Information and Communications Technology started developing the Japanese WordNet in 2006 and will publicly release the first version, which includes both the synset in Japanese and the...

Enhancing the Japanese WordNet

The Japanese WordNet currently has 51,000 synsets with Japanese entries. In this paper, we discuss three methods of extending it: increasing the cover, linking it to examples in corpora and linking it to other resources (SUMO and GoiTaikei). In addition, we outline our plans to make it more useful by adding Japanese definition sentences to each synset. Finally, we discuss how releasing the corp...

"PolNet - Polish WordNet" project: PolNet 2.0 - a short description of the release

In December 2011/January 2012 we have released the main deliverable of the project "PolNet Polish WordNet". It was first presented and distributed (as PolNet 1.0) at the 5th Language and Technology Conference in Poznań (2011) and (informally, with kind permission of the organizers) distributed during the Global Wordnet Conference in Matsue, Japan, in January 2012. We intend to present to the pa...

Boot-Strapping a WordNet Using Multiple Existing WordNets

In this paper we describe the construction of an illustrated Japanese Wordnet. We bootstrap the Wordnet using multiple existing wordnets in order to deal with the ambiguity inherent in translation. We illustrate it with pictures from the Open Clip Art Library.

Baldwin, Timothy, Su Nam Kim, Francis Bond, Sanae Fujita, David Martinez and Takaaki Tanaka (2008) MRD-based Word Sense Disambiguation: Further Extending Lesk, In Proceedings of the Third International Joint Conference on Natural Language Processing (IJCNLP 2008), Hyderabad, India

This paper reconsiders the task of MRD-based word sense disambiguation, in extending the basic Lesk algorithm to investigate the impact on WSD performance of different tokenisation schemes, scoring mechanisms, methods of gloss extension and filtering methods. In experimentation over the Lexeed Sensebank and the Japanese Senseval2 dictionary task, we demonstrate that character bigrams with sense-s...

Publication date: 2012